Git Internals (How Git Works Under the Hood)
This article explains Git internals with one goal: make Git predictable.
You are not expected to memorize commands or file paths. You are expected to understand how Git thinks.
Once this clicks, merge, rebase, reset, and recovery stop feeling scary.
The .git directory
The .git directory is the Git repository.
Everything Git knows about your project lives here.
Your working directory contains files.
The .git directory contains history.
If .git is deleted:
- all commits are lost
- all branches are lost
- Git cannot recover history
Your files may still exist, but Git tracking is permanently gone.
Important contents inside .git:
objects/→ actual datarefs/→ branch pointersHEAD→ current positionindex→ staging area
Git objects: how Git stores data
Git stores everything as objects. Objects are immutable.
This means:
- objects are never modified
- new objects are created instead
There are three core object types.
Blob objects (file content)
A blob stores only file content.
It does not store:
- file name
- directory location
- permissions
If two files have identical content, Git stores only one blob.
This is why Git is space efficient.
Tree objects (directory structure)
A tree represents a directory.
A tree maps:
- file names → blob objects
- subdirectories → other tree objects
Trees describe structure. Blobs describe content.
Together, they represent a snapshot.
Commit objects (snapshots)
A commit object represents a snapshot in time.
A commit contains:
- a pointer to a tree (the snapshot)
- parent commit(s)
- author and committer information
- commit message
A commit does not store files. It points to trees, which point to blobs.
This indirection is intentional.
Why commits are immutable
Every Git object is identified by a hash. The hash is calculated from the object content.
If anything inside the object changes:
- the hash changes
- Git treats it as a new object
This guarantees:
- history cannot be silently modified
- corruption is detectable
- commits are trustworthy
This is the foundation of Git safety.
Branches: just pointers
A branch is not a copy of code. A branch is a pointer to a commit.
When you create a branch:
- no files are duplicated
- no commits are copied
- only a pointer is created
When you commit:
- Git creates a new commit object
- the current branch pointer moves forward
Other branches are unaffected.
This is why branches are cheap and fast.
HEAD: where you are right now
HEAD represents your current position.
Usually:
HEAD→ branch → commit
This means:
- commits move the branch
- HEAD follows the branch
Detached HEAD explained
Detached HEAD means:
- HEAD points directly to a commit
- no branch points to that commit
You can:
- inspect files
- run builds
- experiment safely
But:
- new commits will be lost unless you create a branch
Detached HEAD is safe for inspection, not for long-term work.
The staging area (index)
The staging area is stored in .git/index.
It sits between working directory and repository.
The staging area decides:
- what goes into the next commit
- what stays out
This allows:
- partial commits
- clean commit history
- intentional changes
Only staged content is committed.
This is why git add exists.
The three Git areas (important)
Git always works with three areas:
-
Working directory
Files you edit. -
Staging area (index)
Files prepared for commit. -
Repository
Committed history.
Commands move data between these areas. Understanding this removes confusion.
What git commit really does
When you run git commit:
- Git reads the staging area
- Git creates blob objects if needed
- Git creates tree objects
- Git creates a commit object
- Git moves the branch pointer
Nothing is overwritten. Everything is additive.
Why rebase rewrites history
Rebase does not move commits. It creates new commits.
During rebase:
- original commits are abandoned
- new commits are created with new hashes
- branch pointer moves to new commits
Old commits still exist temporarily, but become unreachable.
This is why rebasing shared branches is dangerous.
Why merge is safe
A merge commit has multiple parents.
This records:
- both histories
- where they joined
Nothing is rewritten. No commit hashes change.
Merge preserves reality. Rebase rewrites the story.
Why Git can recover lost commits
Git does not immediately delete objects.
Unreachable commits remain until:
- garbage collection runs
- retention period expires
This is why:
git reflogcan recover mistakes- accidental resets are often reversible
Git prefers safety over cleanup.
Why Git is fast
Git is fast because:
- operations are local
- objects are content-addressed
- history is immutable
Most commands never touch the network.
This is why Git scales well.
Why internals matter
Understanding internals helps you:
- predict Git behavior
- choose merge vs rebase correctly
- recover from mistakes calmly
- stop guessing
Git is not complex. It is strict.
Once the model is clear, Git becomes boring — and that is good.